Key innovations and the diversification of Hymenoptera

The order Hymenoptera (wasps, ants, sawflies, and bees) represents one of the most diverse animal lineages, but whether specific key innovations have contributed to its diversification is still unknown. We assembled the largest time-calibrated phylogeny of Hymenoptera to date and investigated the origin and possible correlation of particular morphological and behavioral innovations with diversification in the order: the wasp waist of Apocrita; the stinger of Aculeata; parasitoidism, a specialized form of carnivory; and secondary phytophagy, a reversal to plant-feeding. Here, we show that parasitoidism has been the dominant strategy since the Late Triassic in Hymenoptera, but was not an immediate driver of diversification. Instead, transitions to secondary phytophagy (from parasitoidism) had a major influence on diversification rate in Hymenoptera. Support for the stinger and the wasp waist as key innovations remains equivocal, but these traits may have laid the anatomical and behavioral foundations for adaptations more directly associated with diversification.

Software and code
Policy information about availability of computer code
Bonnie B. Blaimer, Feb 8, 2023
Data collection
UCE data were processed using published scripts within the PHYLUCE package (version 1.5.0; Faircloth 2016); assembly of cleaned reads used Trinity (standalone version trinityrnaseq_r20140717). In PHYLUCE v1.5.0, we used wrapper scripts for the following software: trimming of demultiplexed FASTQ data output for adapter contamination and low-quality bases used Illumiprocessor v2.0.7, based on Trimmomatic v0.32-1; alignment of sequence data for individual UCE loci used MAFFT v7.130b; internal trimming of misaligned bases used Gblocks v0.91b. UCE sequences for the six non-Hymenoptera outgroup taxa were captured from genome assemblies published on NCBI (see Supplementary Data 1 for accession numbers), using scripts published within the PHYLUCE package (v1.5.0) and outlined in the tutorial "harvesting UCEs from genomes" (https://phyluce.readthedocs.io/en/latest/tutorial-three.html). Extraction of protein-coding loci from captured UCEs followed a custom pipeline described in Borowiec (2019), with the associated scripts available at https://github.com/marekborowiec/uce-to-protein; some steps in this pipeline rely on BLASTX v2.8.1 (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Data were then aligned with MAFFT v7.130b and internal trimming was performed with Gblocks v0.91b. For further data exploration (see Supplementary Methods), trimming was performed with trimAl v1.2 and alignment with MUSCLE v3.8.31.
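To illustrate what internal trimming of an alignment does, the following is a minimal sketch in Python. It is not the algorithm used by Gblocks or trimAl and is not taken from the study's scripts: it is a deliberately simple filter that drops alignment columns whose gap fraction exceeds a threshold. The function name `trim_gappy_columns`, the `max_gap_fraction` parameter, and the toy taxon names are all hypothetical.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns in which too many sequences have a gap.

    `alignment` maps a taxon name to its aligned sequence; all sequences
    must have the same length. Illustrative only: Gblocks and trimAl use
    more elaborate criteria than this single gap threshold.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Keep a column only if its fraction of gap/missing characters is
    # at or below the threshold.
    kept_columns = [
        i
        for i in range(length)
        if sum(seq[i] in "-?" for seq in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {
        name: "".join(seq[i] for i in kept_columns)
        for name, seq in zip(names, seqs)
    }


# Toy alignment (hypothetical data): the third column is a gap in two of
# the three sequences, so it is removed at the default threshold.
toy_alignment = {
    "taxon_a": "AC-GT",
    "taxon_b": "A--GT",
    "taxon_c": "ACTGT",
}
trimmed = trim_gappy_columns(toy_alignment)
```

A gap threshold is the simplest possible criterion; it is used here only because it makes the effect of column removal easy to see on a small example.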

Ecological, evolutionary & environmental sciences study design
We assembled a taxon set of 765 species across 94 extant families belonging to 22 superfamilies within the Hymenoptera (ants, wasps, bees and sawflies), and six non-hymenopteran outgroups. This dataset of 765 hymenopteran species (i.e. 765 samples) was chosen to provide a balanced taxon set across the Hymenoptera, representing all major lineages. We generated UCE sequence data de novo for most taxa; sequences for 370 taxa are newly released in the context of this study, while 395 sequences have already been published in other studies by some of us. Data for outgroups were assembled by mining UCEs in silico from published genomes downloaded from NCBI. Specific species representing lineages were chosen based on the availability of DNA-grade tissue samples in museum collections and previously published sequences. Samples represent the entire population of the respective species. Male and female specimens were sampled, without particular preference. Specimen collecting years range from 1905 to 2017. Information on taxon sampling, with NCBI accession numbers and references, is presented in Supplementary Data 1.
Our goal was to create a well-sampled, balanced taxon set across the order Hymenoptera by combining newly generated sequences with previously published data. Our sampling for newly generated data emphasizes the hyperdiverse lineages within the superfamilies Ceraphronoidea, Chalcidoidea, Cynipoidea, Ichneumonoidea and Platygastroidea. The specific taxa chosen to represent lineages were selected based on the availability of DNA-grade tissue samples.
RRK, BFS, BBB, MWG, MLB, EJT, IM, DRS, JYR and AC provided insect museum samples (either as pinned, dried specimens or preserved in 95-100% ethanol) or sequence data. No specimens were collected specifically for this study; museum specimens (where information is available) were collected with a variety of methods, including hand collecting/netting, Malaise traps and yellow pan traps. BBB, BFS, JYR and AC carried out DNA extractions, library preparation and target enrichment for UCEs, using protocols summarized in the Supplementary Methods and Data of the article. BBB, BFS and AC assembled and processed UCE data, and BBB performed phylogenetic and macroevolutionary analyses.
The sequence data for this study were collected between January 2017 and September 2018. This time frame was delimited by the timing of available grant funding for the study. The taxa included in this study stem from museum collections representing global sampling efforts.
Sequence data were cleaned and trimmed according to standard protocols to remove contamination and sequencing error. After this step, no data were excluded.
We have provided all data and code used in the study in a Dryad repository. Further, we provided detailed methodology for analyses in the Supplementary Material. We therefore believe that the study is fully reproducible.
We performed phylogenetic analyses based on the taxon sampling and molecular sampling outlined, applying models of nucleotide evolution to infer phylogenetic trees using maximum likelihood methods. Phylogenetic methods are fundamentally different from a standard statistical experimental design that requires randomization.
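As a small worked illustration of what maximum likelihood estimation under a model of nucleotide evolution means, the sketch below estimates the evolutionary distance between two aligned sequences under the Jukes-Cantor (JC69) model. This is an assumption made purely for illustration: the passage above does not name the substitution models used, and the study's tree inference over hundreds of taxa is far more involved. The function names and toy sequences are hypothetical.

```python
import math


def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of distance d (expected substitutions per site)
    for two aligned sequences under the JC69 model."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if d <= 0:
        raise ValueError("d must be positive")
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # P(site shows the same base)
    p_diff = 0.25 - 0.25 * decay   # P(site shows one specific other base)
    log_lik = 0.0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        # Equal base frequencies (1/4) times the transition probability.
        log_lik += math.log(0.25 * (p_same if a == b else p_diff))
    return log_lik


def jc69_ml_distance(seq_a, seq_b):
    """Closed-form maximum likelihood estimate of the JC69 distance."""
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of differences
    if p >= 0.75:
        return math.inf  # sequences are saturated under JC69
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy sequences (hypothetical data) differing at 1 of 10 sites.
seq_a = "ACGTACGTAC"
seq_b = "ACGTACGTAA"
d_hat = jc69_ml_distance(seq_a, seq_b)
```

The estimate `d_hat` is the value of `d` at which `jc69_log_likelihood` is largest; evaluating the log-likelihood at smaller or larger distances gives lower values, which is the same principle that tree-inference software applies jointly to branch lengths, topology and model parameters.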
The types of questions and phylogenetic analyses addressed in this study do not require a blinded experimental design, because there are no participants who could be influenced by treatments.
This study did not involve laboratory animals.
The insects used for generating sequence data in this study were originally collected and killed in the wild, and either directly preserved in 70-100% ethanol for tissue preservation or pinned and dried for morphological study. Since their collection, they have been housed in various museum collections, either preserved in 95-100% ethanol or stored as dried, pinned specimens. This study did not involve any live animals.